Overview

Dataset statistics

Number of variables: 29
Number of observations: 556205
Missing cells: 452079
Missing cells (%): 2.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 123.1 MiB
Average record size in memory: 232.0 B
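The derived figures in this table follow directly from the raw counts. Missing cells are counted over all 29 × 556205 cells, and the average record size is the total size divided by the number of rows. The arithmetic can be checked in plain Python. Because 123.1 MiB is the rounded value shown, the recomputed record size is only approximate:

```python
# Figures copied from the overview table above.
N_VARIABLES = 29
N_OBSERVATIONS = 556_205
MISSING_CELLS = 452_079
TOTAL_SIZE_MIB = 123.1  # as displayed, i.e. already rounded to 0.1 MiB

# Missing cells are counted over the full variables x observations grid.
total_cells = N_VARIABLES * N_OBSERVATIONS
missing_pct = 100 * MISSING_CELLS / total_cells

# Average record size = total in-memory size / number of rows.
avg_record_bytes = TOTAL_SIZE_MIB * 1024**2 / N_OBSERVATIONS

print(f"Total cells: {total_cells}")                     # Total cells: 16129945
print(f"Missing cells (%): {missing_pct:.1f}%")          # Missing cells (%): 2.8%
print(f"Average record size: {avg_record_bytes:.0f} B")  # Average record size: 232 B
```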

Variable types

Numeric: 12
Unsupported: 1
Categorical: 14
DateTime: 2

Alerts

CRASH TIME has a high cardinality: 1440 distinct values [High cardinality]
LOCATION has a high cardinality: 122234 distinct values [High cardinality]
CONTRIBUTING FACTOR VEHICLE 1 has a high cardinality: 55 distinct values [High cardinality]
VEHICLE TYPE CODE 1 has a high cardinality: 839 distinct values [High cardinality]
VEHICLE TYPE CODE 2 has a high cardinality: 890 distinct values [High cardinality]
df_index is highly correlated with COLLISION_ID and 1 other field [High correlation]
df_index is highly correlated with COLLISION_ID and 2 other fields [High correlation]
COLLISION_ID is highly correlated with df_index and 1 other field [High correlation]
COLLISION_ID is highly correlated with df_index and 2 other fields [High correlation]
Year is highly correlated with df_index and 1 other field [High correlation]
month is highly correlated with df_index and 1 other field [High correlation]
NUMBER OF PERSONS INJURED is highly correlated with NUMBER OF MOTORIST INJURED [High correlation]
NUMBER OF MOTORIST INJURED is highly correlated with NUMBER OF PERSONS INJURED [High correlation]
NUMBER OF PERSONS KILLED is highly correlated with NUMBER OF PEDESTRIANS KILLED and 1 other field [High correlation]
NUMBER OF PERSONS KILLED is highly correlated with NUMBER OF MOTORIST KILLED and 1 other field [High correlation]
NUMBER OF PEDESTRIANS KILLED is highly correlated with NUMBER OF PERSONS KILLED [High correlation]
NUMBER OF MOTORIST KILLED is highly correlated with NUMBER OF PERSONS KILLED [High correlation]
LATITUDE is highly correlated with LONGITUDE [High correlation]
LONGITUDE is highly correlated with LATITUDE [High correlation]
DayOfWeekNumber is highly correlated with hourofweek [High correlation]
DayOfWeekNumber is highly correlated with DayOfWeek and 1 other field [High correlation]
DayOfWeek is highly correlated with DayOfWeekNumber and 1 other field [High correlation]
hourofday is highly correlated with timeofdaypercent [High correlation]
hourofday is highly correlated with timeofdaypercent and 1 other field [High correlation]
timeofdaypercent is highly correlated with hourofday [High correlation]
timeofdaypercent is highly correlated with hourofday and 1 other field [High correlation]
hourofweek is highly correlated with DayOfWeekNumber [High correlation]
hourofweek is highly correlated with DayOfWeekNumber and 3 other fields [High correlation]
BOROUGH has 196099 (35.3%) missing values [Missing]
LATITUDE has 41428 (7.4%) missing values [Missing]
LONGITUDE has 41428 (7.4%) missing values [Missing]
LOCATION has 41428 (7.4%) missing values [Missing]
VEHICLE TYPE CODE 2 has 125877 (22.6%) missing values [Missing]
LATITUDE is highly skewed (γ1 = -26.02064086) [Skewed]
df_index has unique values [Unique]
COLLISION_ID has unique values [Unique]
CRASH DATE is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
NUMBER OF PERSONS INJURED has 432082 (77.7%) zeros [Zeros]
NUMBER OF PEDESTRIANS INJURED has 528873 (95.1%) zeros [Zeros]
NUMBER OF MOTORIST INJURED has 473844 (85.2%) zeros [Zeros]
DayOfWeekNumber has 79914 (14.4%) zeros [Zeros]
hourofday has 20154 (3.6%) zeros [Zeros]
minute has 111720 (20.1%) zeros [Zeros]
timeofdaypercent has 8482 (1.5%) zeros [Zeros]

Reproduction

Analysis started: 2022-05-08 15:36:54.881787
Analysis finished: 2022-05-08 15:40:00.949927
Duration: 3 minutes and 6.07 seconds
Software version: pandas-profiling v3.1.0
Configuration file: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct: 556205
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 397820.5303
Minimum: 13
Maximum: 688757
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 13
5th percentile: 103777.2
Q1: 262458
median: 410423
Q3: 549474
95th percentile: 660789.8
Maximum: 688757
Range: 688744
Interquartile range (IQR): 287016

Descriptive statistics

Standard deviation: 178624.1948
Coefficient of variation (CV): 0.4490069797
Kurtosis: -1.040195417
Mean: 397820.5303
Median Absolute Deviation (MAD): 143508
Skewness: -0.2198333476
Sum: 2.21269768 × 10^11
Variance: 3.190660295 × 10^10
Monotonicity: Strictly increasing
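Several of these statistics are simple functions of the others. Recomputing them also confirms that Sum and Variance are in scientific notation (× 10^11 and × 10^10). The check below uses the rounded values printed above, so agreement is expected only to about six significant digits:

```python
# Summary statistics for df_index, copied from the tables above.
N = 556_205
MINIMUM, MAXIMUM = 13, 688_757
Q1, Q3 = 262_458, 549_474
MEAN = 397_820.5303
STD = 178_624.1948

value_range = MAXIMUM - MINIMUM  # "Range"
iqr = Q3 - Q1                    # "Interquartile range (IQR)"
cv = STD / MEAN                  # "Coefficient of variation (CV)"
variance = STD**2                # "Variance" is the square of the standard deviation
total = MEAN * N                 # "Sum" is the mean times the number of values

print(value_range, iqr)  # 688744 287016
print(f"CV={cv:.6f} Sum={total:.6e} Variance={variance:.6e}")
```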
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
8196 | 1 | < 0.1%
435073 | 1 | < 0.1%
447375 | 1 | < 0.1%
449422 | 1 | < 0.1%
443277 | 1 | < 0.1%
445324 | 1 | < 0.1%
455563 | 1 | < 0.1%
457610 | 1 | < 0.1%
451465 | 1 | < 0.1%
453512 | 1 | < 0.1%
Other values (556195) | 556195 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
13 | 1 | < 0.1%
14 | 1 | < 0.1%
39 | 1 | < 0.1%
93 | 1 | < 0.1%
423 | 1 | < 0.1%
662 | 1 | < 0.1%
690 | 1 | < 0.1%
847 | 1 | < 0.1%
1044 | 1 | < 0.1%
1145 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
688757 | 1 | < 0.1%
688756 | 1 | < 0.1%
688755 | 1 | < 0.1%
688754 | 1 | < 0.1%
688753 | 1 | < 0.1%
688752 | 1 | < 0.1%
688751 | 1 | < 0.1%
688750 | 1 | < 0.1%
688749 | 1 | < 0.1%
688748 | 1 | < 0.1%

CRASH DATE
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
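CRASH DATE was rejected as an unsupported type, so the report has no statistics for it. One way to convert it before re-profiling is sketched below. The sketch assumes, without confirmation from the report, that the values are month/day/year strings; the sample inputs are hypothetical. In pandas, `pd.to_datetime(series, format="%m/%d/%Y", errors="coerce")` performs the same conversion for a whole column.

```python
from datetime import date, datetime
from typing import Optional

# Assumption: CRASH DATE holds strings such as "04/14/2021" (month/day/year).
# The report shows no raw values, so adjust ASSUMED_FORMAT as needed.
ASSUMED_FORMAT = "%m/%d/%Y"


def parse_crash_date(raw: Optional[str]) -> Optional[date]:
    """Parse one raw CRASH DATE value; return None if it is empty or malformed."""
    if raw is None or not raw.strip():
        return None
    try:
        return datetime.strptime(raw.strip(), ASSUMED_FORMAT).date()
    except ValueError:
        # Values that do not match the assumed format are treated as missing,
        # so they can be counted and inspected instead of silently coerced.
        return None


print(parse_crash_date("04/14/2021"))  # 2021-04-14
print(parse_crash_date("2021-04-14"))  # None (does not match the assumed format)
```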

CRASH TIME
Categorical

HIGH CARDINALITY

Distinct: 1440
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Value | Count
0:00 | 8482
16:00 | 7813
17:00 | 7558
15:00 | 7323
14:00 | 7230
Other values (1435) | 517799

Length

Max length: 5
Median length: 5
Mean length: 4.734871136
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 22:50
2nd row: 15:49
3rd row: 16:30
4th row: 20:19
5th row: 15:20

Common Values

Value | Count | Frequency (%)
0:00 | 8482 | 1.5%
16:00 | 7813 | 1.4%
17:00 | 7558 | 1.4%
15:00 | 7323 | 1.3%
14:00 | 7230 | 1.3%
18:00 | 6938 | 1.2%
13:00 | 6655 | 1.2%
12:00 | 6122 | 1.1%
9:00 | 6021 | 1.1%
8:00 | 5729 | 1.0%
Other values (1430) | 486334 | 87.4%
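CRASH TIME is stored as text at one-minute resolution: "H:MM" strings of length 4 or 5, with 1440 distinct values. That is exactly 24 × 60, so every minute of the day occurs. The 8482 zeros reported for timeofdaypercent equal the 8482 rows at 0:00. This is consistent with, but does not prove, timeofdaypercent being derived from minutes since midnight (possibly scaled to a percentage). The sketch below shows such a conversion. `fraction_of_day` is a hypothetical reconstruction, not the code that produced the column:

```python
def minutes_since_midnight(hhmm: str) -> int:
    """Convert an 'H:MM' or 'HH:MM' string such as '0:00' or '22:50' to minutes."""
    hours_text, minutes_text = hhmm.strip().split(":")
    hours, minutes = int(hours_text), int(minutes_text)
    if not (0 <= hours < 24 and 0 <= minutes < 60):
        raise ValueError(f"not a valid time of day: {hhmm!r}")
    return 60 * hours + minutes


def fraction_of_day(hhmm: str) -> float:
    """Hypothetical derivation: share of the day elapsed, in [0, 1)."""
    return minutes_since_midnight(hhmm) / (24 * 60)


print(minutes_since_midnight("22:50"))  # 1370
print(fraction_of_day("12:00"))         # 0.5
```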

Length

Histogram of lengths of the category


BOROUGH
Categorical

MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 196099
Missing (%): 35.3%
Memory size: 4.2 MiB
Value | Count
BROOKLYN | 117330
QUEENS | 101849
MANHATTAN | 68881
BRONX | 59829
STATEN ISLAND | 12217

Length

Max length: 13
Median length: 8
Mean length: 7.296820936
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BROOKLYN
2nd row: MANHATTAN
3rd row: QUEENS
4th row: MANHATTAN
5th row: MANHATTAN

Common Values

Value | Count | Frequency (%)
BROOKLYN | 117330 | 21.1%
QUEENS | 101849 | 18.3%
MANHATTAN | 68881 | 12.4%
BRONX | 59829 | 10.8%
STATEN ISLAND | 12217 | 2.2%
(Missing) | 196099 | 35.3%

Length

Histogram of lengths of the category

Pie chart

Words
Value | Count | Frequency (%)
brooklyn | 117330 | 31.5%
queens | 101849 | 27.4%
manhattan | 68881 | 18.5%
bronx | 59829 | 16.1%
staten | 12217 | 3.3%
island | 12217 | 3.3%
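The word table differs from Common Values in two ways. First, values are lowercased and split into words, so each STATEN ISLAND row counts toward both "staten" and "island". Second, percentages are relative to the total number of words (372323) rather than rows, with missing values excluded. The sketch below reproduces the table from the category counts. It assumes simple whitespace tokenization, which is an approximation: the LOCATION word table suggests the real tokenizer also drops punctuation such as parentheses, commas and minus signs.

```python
from collections import Counter
from typing import Mapping


def word_counts(value_counts: Mapping[str, int]) -> Counter:
    """Turn category -> row count into word -> count (lowercased, split on whitespace)."""
    counts: Counter = Counter()
    for value, n_rows in value_counts.items():
        for word in value.lower().split():
            counts[word] += n_rows
    return counts


# Non-missing BOROUGH counts from the Common Values table above.
borough_counts = {
    "BROOKLYN": 117330,
    "QUEENS": 101849,
    "MANHATTAN": 68881,
    "BRONX": 59829,
    "STATEN ISLAND": 12217,
}

words = word_counts(borough_counts)
total_words = sum(words.values())  # 372323: each STATEN ISLAND row contributes two words
for word, n in words.most_common():
    print(f"{word} | {n} | {100 * n / total_words:.1f}%")
```

The printed lines match the word table above, including 31.5% for brooklyn.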


LATITUDE
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
MISSING
SKEWED

Distinct: 62407
Distinct (%): 12.1%
Missing: 41428
Missing (%): 7.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.66615177
Minimum: 0
Maximum: 41.12421
Zeros: 751
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 40.598628
Q1: 40.668224
median: 40.72028
Q3: 40.77434
95th percentile: 40.862823
Maximum: 41.12421
Range: 41.12421
Interquartile range (IQR): 0.106116

Descriptive statistics

Standard deviation: 1.556441398
Coefficient of variation (CV): 0.03827363373
Kurtosis: 676.8732272
Mean: 40.66615177
Median Absolute Deviation (MAD): 0.0527
Skewness: -26.02064086
Sum: 20933999.61
Variance: 2.422509826
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 751 | 0.1%
40.861862 | 431 | 0.1%
40.696033 | 375 | 0.1%
40.8047 | 331 | 0.1%
40.798256 | 293 | 0.1%
40.608757 | 281 | 0.1%
40.75898 | 247 | < 0.1%
40.733536 | 246 | < 0.1%
40.820305 | 242 | < 0.1%
40.76229 | 240 | < 0.1%
Other values (62397) | 511340 | 91.9%
(Missing) | 41428 | 7.4%
ValueCountFrequency (%)
0751
0.1%
40.499841
 
< 0.1%
40.5000231
 
< 0.1%
40.5000841
 
< 0.1%
40.5014651
 
< 0.1%
40.501631
 
< 0.1%
40.5019871
 
< 0.1%
40.5021821
 
< 0.1%
40.502341
 
< 0.1%
40.5023961
 
< 0.1%
ValueCountFrequency (%)
41.124211
 
< 0.1%
40.9128846
< 0.1%
40.9124689
< 0.1%
40.912221
 
< 0.1%
40.912171
 
< 0.1%
40.9121171
 
< 0.1%
40.912086
< 0.1%
40.912061
 
< 0.1%
40.9120182
 
< 0.1%
40.9116671
 
< 0.1%

LONGITUDE
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 41779
Distinct (%): 8.1%
Missing: 41428
Missing (%): 7.4%
Infinite: 0
Infinite (%): 0.0%
Mean: -73.8168721
Minimum: -201.23706
Maximum: 0
Zeros: 751
Zeros (%): 0.1%
Negative: 514026
Negative (%): 92.4%
Memory size: 4.2 MiB

Quantile statistics

Minimum: -201.23706
5th percentile: -74.02453
Q1: -73.96899
median: -73.92147
Q3: -73.86239
95th percentile: -73.76111
Maximum: 0
Range: 201.23706
Interquartile range (IQR): 0.1066

Descriptive statistics

Standard deviation: 3.023231539
Coefficient of variation (CV): -0.04095583371
Kurtosis: 742.4564494
Mean: -73.8168721
Median Absolute Deviation (MAD): 0.052834
Skewness: 15.86448813
Sum: -37999227.97
Variance: 9.139928939
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 751 | 0.1%
-73.91282 | 447 | 0.1%
-73.98453 | 396 | 0.1%
-73.91243 | 339 | 0.1%
-73.89063 | 333 | 0.1%
-73.882744 | 310 | 0.1%
-73.89686 | 291 | 0.1%
-73.89083 | 284 | 0.1%
-74.038086 | 281 | 0.1%
-73.91727 | 267 | < 0.1%
Other values (41769) | 511078 | 91.9%
(Missing) | 41428 | 7.4%
ValueCountFrequency (%)
-201.2370637
< 0.1%
-74.7423
 
< 0.1%
-74.253931
 
< 0.1%
-74.2530061
 
< 0.1%
-74.2528841
 
< 0.1%
-74.252181
 
< 0.1%
-74.252091
 
< 0.1%
-74.251881
 
< 0.1%
-74.251842
 
< 0.1%
-74.2514951
 
< 0.1%
ValueCountFrequency (%)
0751
0.1%
-32.7685132
 
< 0.1%
-73.700552
 
< 0.1%
-73.7005845
 
< 0.1%
-73.700712
 
< 0.1%
-73.700731
 
< 0.1%
-73.700761
 
< 0.1%
-73.700842
 
< 0.1%
-73.7009914
 
< 0.1%
-73.7010041
 
< 0.1%
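The 751 zeros in LATITUDE and LONGITUDE coincide with the 751 "(0.0, 0.0)" entries in LOCATION. Together with a few far-off points (longitude minimum -201.23706, latitude maximum 41.12421), they largely account for the extreme skewness, kurtosis and standard deviation of both columns. By contrast, the interquartile ranges are only about 0.11 degrees. The sketch below treats such points as missing before analysis. The bounding box is an assumption chosen to contain the bulk of the profiled values; the report does not define it:

```python
from typing import Optional, Tuple

# Hypothetical, approximate bounding box for New York City (assumption).
LAT_MIN, LAT_MAX = 40.49, 40.92
LON_MIN, LON_MAX = -74.26, -73.69


def clean_coordinates(
    lat: Optional[float], lon: Optional[float]
) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) if plausible, otherwise None (treat as missing)."""
    if lat is None or lon is None:
        return None
    # (0, 0) is a placeholder: LOCATION shows 751 rows of "(0.0, 0.0)".
    if lat == 0 and lon == 0:
        return None
    # Points outside the assumed box, such as longitude -201.23706 (not even a
    # valid longitude) or latitude 41.12421, are dropped as well.
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return None
    return lat, lon


print(clean_coordinates(40.861862, -73.91282))  # (40.861862, -73.91282)
print(clean_coordinates(0.0, 0.0))              # None
```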

LOCATION
Categorical

HIGH CARDINALITY
MISSING

Distinct: 122234
Distinct (%): 23.7%
Missing: 41428
Missing (%): 7.4%
Memory size: 4.2 MiB
Value | Count
(0.0, 0.0) | 751
(40.861862, -73.91282) | 429
(40.696033, -73.98453) | 375
(40.8047, -73.91243) | 312
(40.608757, -74.038086) | 281
Other values (122229) | 512629

Length

Max length: 25
Median length: 22
Mean length: 21.71638982
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 70434
Unique (%): 13.7%

Sample

1st row: (40.69754, -73.98312)
2nd row: (40.671585, -73.99843)
3rd row: (40.651974, -73.86542)
4th row: (40.77161, -73.99046)
5th row: (40.771038, -73.83413)

Common Values

Value | Count | Frequency (%)
(0.0, 0.0) | 751 | 0.1%
(40.861862, -73.91282) | 429 | 0.1%
(40.696033, -73.98453) | 375 | 0.1%
(40.8047, -73.91243) | 312 | 0.1%
(40.608757, -74.038086) | 281 | 0.1%
(40.798256, -73.82744) | 257 | < 0.1%
(40.733536, -73.87035) | 246 | < 0.1%
(40.820305, -73.89083) | 242 | < 0.1%
(40.675735, -73.89686) | 236 | < 0.1%
(40.83801, -73.87329) | 233 | < 0.1%
Other values (122224) | 511415 | 91.9%
(Missing) | 41428 | 7.4%
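LOCATION appears to be the latitude/longitude pair rendered as text. Its most common values pair up with the top LATITUDE and LONGITUDE values, and it has the same number of missing values (41428). This explains why it is profiled as a high-cardinality categorical. Its shortest value, "(0.0, 0.0)", is exactly the 10 characters reported as Min length. The sketch below parses it back into numbers, assuming every non-missing value has the "(lat, lon)" shape seen in the sample rows:

```python
import re
from typing import Optional, Tuple

# Matches strings such as "(40.69754, -73.98312)" or "(0.0, 0.0)".
_LOCATION_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_location(text: Optional[str]) -> Optional[Tuple[float, float]]:
    """Parse a LOCATION string into (latitude, longitude); None if malformed."""
    if text is None:
        return None
    match = _LOCATION_RE.fullmatch(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(parse_location("(40.69754, -73.98312)"))  # (40.69754, -73.98312)
print(parse_location("(0.0, 0.0)"))             # (0.0, 0.0)
```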

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
0.0 | 1502 | 0.1%
73.91282 | 447 | < 0.1%
40.861862 | 431 | < 0.1%
73.98453 | 396 | < 0.1%
40.696033 | 375 | < 0.1%
73.91243 | 339 | < 0.1%
73.89063 | 333 | < 0.1%
40.8047 | 331 | < 0.1%
73.882744 | 310 | < 0.1%
40.798256 | 293 | < 0.1%
Other values (104175) | 1024797 | 99.5%


NUMBER OF PERSONS INJURED
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 21
Distinct (%): < 0.1%
Missing: 5
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.302121539
Minimum: 0
Maximum: 22
Zeros: 432082
Zeros (%): 77.7%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 2
Maximum: 22
Range: 22
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6849383441
Coefficient of variation (CV): 2.267095376
Kurtosis: 28.30558787
Mean: 0.302121539
Median Absolute Deviation (MAD): 0
Skewness: 3.773722175
Sum: 168040
Variance: 0.4691405353
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
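Common values
Value | Count | Frequency (%)
0 | 432082 | 77.7%
1 | 96100 | 17.3%
2 | 18409 | 3.3%
3 | 5965 | 1.1%
4 | 2219 | 0.4%
5 | 837 | 0.2%
6 | 319 | 0.1%
7 | 136 | < 0.1%
8 | 61 | < 0.1%
9 | 25 | < 0.1%
Other values (11) | 47 | < 0.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 432082 | 77.7%
1 | 96100 | 17.3%
2 | 18409 | 3.3%
3 | 5965 | 1.1%
4 | 2219 | 0.4%
5 | 837 | 0.2%
6 | 319 | 0.1%
7 | 136 | < 0.1%
8 | 61 | < 0.1%
9 | 25 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
22 | 1 | < 0.1%
20 | 2 | < 0.1%
18 | 1 | < 0.1%
17 | 3 | < 0.1%
16 | 1 | < 0.1%
15 | 2 | < 0.1%
14 | 2 | < 0.1%
13 | 3 | < 0.1%
12 | 7 | < 0.1%
11 | 9 | < 0.1%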
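The frequency tables are complete enough to recompute the descriptive statistics. With 5 rows missing, 556200 values remain, and their weighted sum reproduces the reported Sum of 168040 and hence the Mean. The only count not printed directly is that of the value 10. The sketch infers it from the "Other values (11)" row; the sum check then confirms that inference:

```python
# Value -> count, read off the frequency tables above (missing values excluded).
counts = {
    0: 432082, 1: 96100, 2: 18409, 3: 5965, 4: 2219,
    5: 837, 6: 319, 7: 136, 8: 61, 9: 25,
    # the ten largest values, from the maximum table
    11: 9, 12: 7, 13: 3, 14: 2, 15: 2, 16: 1, 17: 3, 18: 1, 20: 2, 22: 1,
}

# "Other values (11)" covers 47 rows. Ten of those eleven values appear in the
# maximum table, so the remaining one (10, since Distinct is 21) gets the rest.
counts[10] = 47 - sum(n for value, n in counts.items() if value > 9)

n_rows = sum(counts.values())
total = sum(value * n for value, n in counts.items())
mean = total / n_rows

print(counts[10], n_rows, total)  # 16 556200 168040
print(f"{mean:.9f}")              # 0.302121539
```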

NUMBER OF PERSONS KILLED
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 14
Missing (%): < 0.1%
Memory size: 4.2 MiB
Value | Count
0.0 | 555470
1.0 | 700
2.0 | 18
3.0 | 2
4.0 | 1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 555470 | 99.9%
1.0 | 700 | 0.1%
2.0 | 18 | < 0.1%
3.0 | 2 | < 0.1%
4.0 | 1 | < 0.1%
(Missing) | 14 | < 0.1%
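The other NUMBER OF ... KILLED columns write their values as 0, 1, 2. This column instead holds values such as 0.0 and 1.0 (every value is 3 characters long), and it was profiled as categorical. That pattern is typical of an integer column that became floating point because it contains missing values (14 here), although the report does not say so explicitly. The sketch below converts such values back to whole numbers. In pandas, `pd.to_numeric(...)` followed by `.astype("Int64")` would do the same while keeping the missing rows as `<NA>`.

```python
from typing import Optional


def to_count(raw: Optional[str]) -> Optional[int]:
    """Convert a float-formatted count such as '2.0' to 2; empty or None -> None."""
    if raw is None or not raw.strip():
        return None
    value = float(raw)
    if value < 0 or not value.is_integer():
        raise ValueError(f"not a non-negative whole number: {raw!r}")
    return int(value)


# Counts from the Common Values table above (the 14 missing rows are excluded).
killed_counts = {"0.0": 555470, "1.0": 700, "2.0": 18, "3.0": 2, "4.0": 1}
total_killed = sum(to_count(value) * n for value, n in killed_counts.items())
print(total_killed)  # 746
```

With the counts above, the total is 1×700 + 2×18 + 3×2 + 4×1 = 746 persons killed across the non-missing rows.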

Length

Histogram of lengths of the category

Pie chart

Words
Value | Count | Frequency (%)
0.0 | 555470 | 99.9%
1.0 | 700 | 0.1%
2.0 | 18 | < 0.1%
3.0 | 2 | < 0.1%
4.0 | 1 | < 0.1%


NUMBER OF PEDESTRIANS INJURED
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0510621084
Minimum: 0
Maximum: 7
Zeros: 528873
Zeros (%): 95.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 0
Maximum: 7
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2302640362
Coefficient of variation (CV): 4.509489394
Kurtosis: 30.62087674
Mean: 0.0510621084
Median Absolute Deviation (MAD): 0
Skewness: 4.886336249
Sum: 28401
Variance: 0.05302152638
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
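Common values
Value | Count | Frequency (%)
0 | 528873 | 95.1%
1 | 26403 | 4.7%
2 | 828 | 0.1%
3 | 77 | < 0.1%
4 | 15 | < 0.1%
5 | 4 | < 0.1%
6 | 4 | < 0.1%
7 | 1 | < 0.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 528873 | 95.1%
1 | 26403 | 4.7%
2 | 828 | 0.1%
3 | 77 | < 0.1%
4 | 15 | < 0.1%
5 | 4 | < 0.1%
6 | 4 | < 0.1%
7 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
7 | 1 | < 0.1%
6 | 4 | < 0.1%
5 | 4 | < 0.1%
4 | 15 | < 0.1%
3 | 77 | < 0.1%
2 | 828 | 0.1%
1 | 26403 | 4.7%
0 | 528873 | 95.1%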

NUMBER OF PEDESTRIANS KILLED
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Value | Count
0 | 555852
1 | 350
2 | 3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 555852 | 99.9%
1 | 350 | 0.1%
2 | 3 | < 0.1%

Length

Histogram of lengths of the category

Pie chart

Words
Value | Count | Frequency (%)
0 | 555852 | 99.9%
1 | 350 | 0.1%
2 | 3 | < 0.1%


Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Value | Count
0 | 541127
1 | 14866
2 | 208
3 | 4

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 541127 | 97.3%
1 | 14866 | 2.7%
2 | 208 | < 0.1%
3 | 4 | < 0.1%

Length

Histogram of lengths of the category

Pie chart

Words
Value | Count | Frequency (%)
0 | 541127 | 97.3%
1 | 14866 | 2.7%
2 | 208 | < 0.1%
3 | 4 | < 0.1%


Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Value | Count
0 | 556135
1 | 70

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 556135 | > 99.9%
1 | 70 | < 0.1%

Length

Histogram of lengths of the category

Pie chart

Words
Value | Count | Frequency (%)
0 | 556135 | > 99.9%
1 | 70 | < 0.1%


NUMBER OF MOTORIST INJURED
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2235165092
Minimum: 0
Maximum: 21
Zeros: 473844
Zeros (%): 85.2%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 1
Maximum: 21
Range: 21
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6500539542
Coefficient of variation (CV): 2.908303984
Kurtosis: 36.48045492
Mean: 0.2235165092
Median Absolute Deviation (MAD): 0
Skewness: 4.552156025
Sum: 124321
Variance: 0.4225701434
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
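Common values
Value | Count | Frequency (%)
0 | 473844 | 85.2%
1 | 55963 | 10.1%
2 | 16998 | 3.1%
3 | 5821 | 1.0%
4 | 2187 | 0.4%
5 | 822 | 0.1%
6 | 312 | 0.1%
7 | 129 | < 0.1%
8 | 57 | < 0.1%
9 | 25 | < 0.1%
Other values (11) | 47 | < 0.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 473844 | 85.2%
1 | 55963 | 10.1%
2 | 16998 | 3.1%
3 | 5821 | 1.0%
4 | 2187 | 0.4%
5 | 822 | 0.1%
6 | 312 | 0.1%
7 | 129 | < 0.1%
8 | 57 | < 0.1%
9 | 25 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
21 | 1 | < 0.1%
20 | 2 | < 0.1%
18 | 1 | < 0.1%
17 | 3 | < 0.1%
16 | 1 | < 0.1%
15 | 2 | < 0.1%
14 | 2 | < 0.1%
13 | 3 | < 0.1%
12 | 6 | < 0.1%
11 | 10 | < 0.1%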

NUMBER OF MOTORIST KILLED
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Value | Count
0 | 555905
1 | 284
2 | 13
3 | 2
4 | 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 555905 | 99.9%
1 | 284 | 0.1%
2 | 13 | < 0.1%
3 | 2 | < 0.1%
4 | 1 | < 0.1%

Length

Histogram of lengths of the category

Pie chart

Words
Value | Count | Frequency (%)
0 | 555905 | 99.9%
1 | 284 | 0.1%
2 | 13 | < 0.1%
3 | 2 | < 0.1%
4 | 1 | < 0.1%


CONTRIBUTING FACTOR VEHICLE 1
Categorical

HIGH CARDINALITY

Distinct: 55
Distinct (%): < 0.1%
Missing: 1942
Missing (%): 0.3%
Memory size: 4.2 MiB
Value | Count
Driver Inattention/Distraction | 140948
Unspecified | 128683
Following Too Closely | 48977
Failure to Yield Right-of-Way | 38325
Backing Unsafely | 24912
Other values (50) | 172418

Length

Max length: 53
Median length: 21
Mean length: 21.45328842
Min length: 5

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Passing or Lane Usage Improper
2nd row: Driver Inattention/Distraction
3rd row: Reaction to Uninvolved Vehicle
4th row: Following Too Closely
5th row: Driver Inattention/Distraction

Common Values

Value | Count | Frequency (%)
Driver Inattention/Distraction | 140948 | 25.3%
Unspecified | 128683 | 23.1%
Following Too Closely | 48977 | 8.8%
Failure to Yield Right-of-Way | 38325 | 6.9%
Backing Unsafely | 24912 | 4.5%
Passing or Lane Usage Improper | 24084 | 4.3%
Passing Too Closely | 22589 | 4.1%
Unsafe Lane Changing | 17600 | 3.2%
Other Vehicular | 16500 | 3.0%
Turning Improperly | 12422 | 2.2%
Other values (45) | 79223 | 14.2%

Length

Histogram of lengths of the category
Words
Value | Count | Frequency (%)
driver | 149719 | 12.0%
inattention/distraction | 140948 | 11.3%
unspecified | 128683 | 10.3%
too | 71566 | 5.7%
closely | 71566 | 5.7%
following | 48977 | 3.9%
to | 48003 | 3.8%
passing | 46673 | 3.7%
lane | 41984 | 3.4%
failure | 39856 | 3.2%
Other values (93) | 464434 | 37.1%


COLLISION_ID
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct: 556205
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4102031.994
Minimum: 3511951
Maximum: 4513071
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 3511951
5th percentile: 3849157.2
Q1: 3960658
median: 4103330
Q3: 4242467
95th percentile: 4353752.8
Maximum: 4513071
Range: 1001120
Interquartile range (IQR): 281809

Descriptive statistics

Standard deviation: 162142.2421
Coefficient of variation (CV): 0.03952729826
Kurtosis: -1.204276867
Mean: 4102031.994
Median Absolute Deviation (MAD): 140900
Skewness: -0.0090812053
Sum: 2.281570705 × 10^12
Variance: 2.629010667 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
4194304 | 1 | < 0.1%
3895867 | 1 | < 0.1%
4121225 | 1 | < 0.1%
4123272 | 1 | < 0.1%
4100743 | 1 | < 0.1%
4102790 | 1 | < 0.1%
4096645 | 1 | < 0.1%
4098692 | 1 | < 0.1%
4108931 | 1 | < 0.1%
4110978 | 1 | < 0.1%
Other values (556195) | 556195 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
3511951 | 1 | < 0.1%
3590187 | 1 | < 0.1%
3591031 | 1 | < 0.1%
3600567 | 1 | < 0.1%
3818619 | 1 | < 0.1%
3818620 | 1 | < 0.1%
3818625 | 1 | < 0.1%
3818637 | 1 | < 0.1%
3818640 | 1 | < 0.1%
3818641 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
4513071 | 1 | < 0.1%
4512076 | 1 | < 0.1%
4511910 | 1 | < 0.1%
4511268 | 1 | < 0.1%
4498730 | 1 | < 0.1%
4498013 | 1 | < 0.1%
4493620 | 1 | < 0.1%
4491746 | 1 | < 0.1%
4485338 | 1 | < 0.1%
4483137 | 1 | < 0.1%

VEHICLE TYPE CODE 1
Categorical

HIGH CARDINALITY

Distinct: 839
Distinct (%): 0.2%
Missing: 3858
Missing (%): 0.7%
Memory size: 4.2 MiB
Value | Count
Sedan | 254180
Station Wagon/Sport Utility Vehicle | 203500
Taxi | 25281
Pick-up Truck | 16057
Box Truck | 11036
Other values (834) | 42293

Length

Max length: 38
Median length: 5
Mean length: 16.53787384
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0

Unique

Unique: 500
Unique (%): 0.1%

Sample

1st row: �MBU
2nd row: Sedan
3rd row: Sedan
4th row: Sedan
5th row: Sedan

Common Values

Value | Count | Frequency (%)
Sedan | 254180 | 45.7%
Station Wagon/Sport Utility Vehicle | 203500 | 36.6%
Taxi | 25281 | 4.5%
Pick-up Truck | 16057 | 2.9%
Box Truck | 11036 | 2.0%
Bus | 8668 | 1.6%
Bike | 5675 | 1.0%
Tractor Truck Diesel | 4424 | 0.8%
Van | 3513 | 0.6%
Motorcycle | 3044 | 0.5%
Other values (829) | 16969 | 3.1%
(Missing) | 3858 | 0.7%
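VEHICLE TYPE CODE 1 has 839 distinct labels. Even so, the ten most common cover about 96% of the rows, and 500 labels occur only once (the Unique count). A common way to tame such high cardinality before modelling or plotting is to keep the top labels and pool the rest. The sketch below does this; the default `keep=10` and the "Other" label are arbitrary assumptions:

```python
from collections import Counter
from typing import List, Optional, Sequence


def collapse_rare(
    values: Sequence[Optional[str]], keep: int = 10, other: str = "Other"
) -> List[Optional[str]]:
    """Keep the `keep` most frequent labels and map the rest to `other`.

    Missing values (None) are left as they are. Ties at the cut-off are broken
    by first appearance, which is how Counter.most_common orders equal counts.
    """
    counts = Counter(v for v in values if v is not None)
    top = {label for label, _ in counts.most_common(keep)}
    return [v if v is None or v in top else other for v in values]


sample = ["Sedan", "Sedan", "Taxi", "Bus", None, "Sedan"]
print(collapse_rare(sample, keep=1))
# ['Sedan', 'Sedan', 'Other', 'Other', None, 'Sedan']
```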

Length

Histogram of lengths of the category
Word frequencies
Value | Count | Frequency (%)
sedan | 254521 | 21.1%
vehicle | 203543 | 16.8%
utility | 203511 | 16.8%
station | 203500 | 16.8%
wagon/sport | 203500 | 16.8%
truck | 33241 | 2.8%
taxi | 25281 | 2.1%
pick-up | 16058 | 1.3%
box | 11085 | 0.9%
bus | 8769 | 0.7%
Other values (538) | 45407 | 3.8%
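The word table appears to count lowercased, whitespace-separated tokens across all non-missing values. That would explain why station and wagon/sport both equal the 203500 rows of "Station Wagon/Sport Utility Vehicle", while sedan (254521) slightly exceeds the 254180 "Sedan" rows: other category labels also contain the word. The sketch below works under that tokenization assumption. `word_counts` and the sample labels (including "4 dr sedan") are hypothetical.

```python
from collections import Counter

def word_counts(values):
    """Count lowercased tokens across non-missing values (None = missing).

    Assumption: tokens are split on whitespace only, so "wagon/sport"
    stays a single word, as it does in the report's table.
    """
    words = Counter()
    for v in values:
        if v is not None:
            words.update(v.lower().split())
    return words

# Hypothetical labels; "4 dr sedan" shows how "sedan" can outnumber "Sedan".
column = ["Sedan", "Station Wagon/Sport Utility Vehicle", "Sedan", "4 dr sedan", None]
print(word_counts(column).most_common(3))
```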

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

VEHICLE TYPE CODE 2
Categorical

HIGH CARDINALITY
MISSING

Distinct: 890
Distinct (%): 0.2%
Missing: 125877
Missing (%): 22.6%
Memory size: 4.2 MiB
Sedan: 184957
Station Wagon/Sport Utility Vehicle: 152883
Taxi: 19015
Pick-up Truck: 14931
Box Truck: 12418
Other values (885): 46124

Length

Max length: 38
Median length: 5
Mean length: 16.25627428
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 531
Unique (%): 0.1%

Sample

1st row: Taxi
2nd row: Station Wagon/Sport Utility Vehicle
3rd row: Sedan
4th row: Tractor Truck Diesel
5th row: Sedan

Common Values

Value | Count | Frequency (%)
Sedan | 184957 | 33.3%
Station Wagon/Sport Utility Vehicle | 152883 | 27.5%
Taxi | 19015 | 3.4%
Pick-up Truck | 14931 | 2.7%
Box Truck | 12418 | 2.2%
Bike | 11773 | 2.1%
Bus | 7760 | 1.4%
Tractor Truck Diesel | 4375 | 0.8%
Van | 3275 | 0.6%
Motorcycle | 2363 | 0.4%
Other values (880) | 16578 | 3.0%
(Missing) | 125877 | 22.6%

Length

[Figure: Histogram of lengths of the category]
Word frequencies
Value | Count | Frequency (%)
sedan | 185201 | 19.8%
vehicle | 152922 | 16.4%
utility | 152890 | 16.4%
station | 152883 | 16.4%
wagon/sport | 152883 | 16.4%
truck | 33387 | 3.6%
taxi | 19016 | 2.0%
pick-up | 14932 | 1.6%
box | 12474 | 1.3%
bike | 11779 | 1.3%
Other values (562) | 46014 | 4.9%

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

DateTime
Date

Distinct: 1097
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Minimum: 2018-01-01 00:00:00
Maximum: 2021-01-01 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]

Year
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
2018: 231563
2019: 211485
2020: 112900
2021: 257

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value | Count | Frequency (%)
2018 | 231563 | 41.6%
2019 | 211485 | 38.0%
2020 | 112900 | 20.3%
2021 | 257 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]
Word frequencies
Value | Count | Frequency (%)
2018 | 231563 | 41.6%
2019 | 211485 | 38.0%
2020 | 112900 | 20.3%
2021 | 257 | < 0.1%

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

Time
Date

Distinct: 1440
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Minimum: 1900-01-01 00:00:00
Maximum: 1900-01-01 23:59:00
[Figure: Histogram with fixed size bins (bins=50)]
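The Time variable spans 1900-01-01 00:00:00 to 1900-01-01 23:59:00 and has 1440 distinct values, one for every minute of the day. This suggests the CRASH TIME strings were parsed as times without a date. Python's `datetime.strptime` fills missing date fields with 1900-01-01, so a sketch like the following reproduces that placeholder date. That the column was built this way is an assumption, and `parse_crash_time` is a hypothetical helper. The `"%H:%M"` format is inferred from sample values such as 22:50 and 0:49.

```python
from datetime import datetime

def parse_crash_time(text):
    """Parse an "H:MM" crash-time string; the date part defaults to 1900-01-01."""
    return datetime.strptime(text, "%H:%M")

stamp = parse_crash_time("22:50")
print(stamp)         # 1900-01-01 22:50:00 (strptime supplies the missing date)
print(stamp.time())  # 22:50:00 (keep only the time of day)
```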

DayOfWeekNumber
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.905658885
Minimum: 0
Maximum: 6
Zeros: 79914
Zeros (%): 14.4%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
median: 3
Q3: 5
95th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.942335525
Coefficient of variation (CV): 0.6684664657
Kurtosis: -1.184752032
Mean: 2.905658885
Median Absolute Deviation (MAD): 2
Skewness: 0.03410297032
Sum: 1616142
Variance: 3.772667292
Monotonicity: Not monotonic
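Most of these descriptive statistics can be reproduced from the seven value counts alone. The minimal sketch below assumes that the variance uses the sample (n - 1) denominator and that MAD is the median absolute deviation around the median. Under those assumptions it matches the reported Sum, Mean, Variance, Standard deviation, CV, median and MAD for DayOfWeekNumber. Skewness and kurtosis are left out. The helper names are hypothetical.

```python
import math

def kth_value(freq, k):
    """k-th smallest observation (0-based) of data given as {value: count}."""
    seen = 0
    for value in sorted(freq):
        seen += freq[value]
        if k < seen:
            return value
    raise IndexError(k)

def median(freq):
    n = sum(freq.values())
    # Mean of the two middle observations; they coincide when n is odd.
    return (kth_value(freq, (n - 1) // 2) + kth_value(freq, n // 2)) / 2

def describe(freq):
    n = sum(freq.values())
    total = sum(v * c for v, c in freq.items())
    mean = total / n
    var = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)  # sample variance
    std = math.sqrt(var)
    med = median(freq)
    # Median absolute deviation: the median of |x - median|.
    deviations = {}
    for v, c in freq.items():
        d = abs(v - med)
        deviations[d] = deviations.get(d, 0) + c
    return {"sum": total, "mean": mean, "variance": var, "std": std,
            "cv": std / mean, "median": med, "mad": median(deviations)}

# Counts copied from the DayOfWeekNumber frequency table (0 = Monday).
day_counts = {0: 79914, 1: 82483, 2: 81105, 3: 84092, 4: 89013, 5: 74467, 6: 65131}
for name, value in describe(day_counts).items():
    print(name, value)
```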
[Figure: Histogram with fixed size bins (bins=7)]
Common values
Value | Count | Frequency (%)
4 | 89013 | 16.0%
3 | 84092 | 15.1%
1 | 82483 | 14.8%
2 | 81105 | 14.6%
0 | 79914 | 14.4%
5 | 74467 | 13.4%
6 | 65131 | 11.7%
Minimum 10 values
Value | Count | Frequency (%)
0 | 79914 | 14.4%
1 | 82483 | 14.8%
2 | 81105 | 14.6%
3 | 84092 | 15.1%
4 | 89013 | 16.0%
5 | 74467 | 13.4%
6 | 65131 | 11.7%
Maximum 10 values
Value | Count | Frequency (%)
6 | 65131 | 11.7%
5 | 74467 | 13.4%
4 | 89013 | 16.0%
3 | 84092 | 15.1%
2 | 81105 | 14.6%
1 | 82483 | 14.8%
0 | 79914 | 14.4%

DayOfWeek
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Friday: 89013
Thursday: 84092
Tuesday: 82483
Wednesday: 81105
Monday: 79914
Other values (2): 139598

Length

Max length: 9
Median length: 7
Mean length: 7.155897556
Min length: 6

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Tuesday
2nd row: Tuesday
3rd row: Thursday
4th row: Friday
5th row: Wednesday

Common Values

Value | Count | Frequency (%)
Friday | 89013 | 16.0%
Thursday | 84092 | 15.1%
Tuesday | 82483 | 14.8%
Wednesday | 81105 | 14.6%
Monday | 79914 | 14.4%
Saturday | 74467 | 13.4%
Sunday | 65131 | 11.7%
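The DayOfWeekNumber counts match the DayOfWeek counts one-to-one: 0 with Monday at 79914, 4 with Friday at 89013, and 6 with Sunday at 65131. The numbering therefore runs from 0 = Monday to 6 = Sunday, which is the convention of Python's `date.weekday()` and of pandas' `dayofweek`. Below is a minimal check against CRASH DATE values from the sample rows. `day_of_week` is a hypothetical helper, and the day names are hard-coded so the result does not depend on the locale.

```python
from datetime import date

# Fixed English names, index 0 = Monday, matching date.weekday().
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def day_of_week(iso_date):
    """Return (DayOfWeekNumber, DayOfWeek) for an ISO date string."""
    n = date.fromisoformat(iso_date).weekday()
    return n, DAY_NAMES[n]

# CRASH DATE values and their labels, copied from the sample rows.
for iso, expected in [("2019-05-21", (1, "Tuesday")), ("2020-12-25", (4, "Friday")),
                      ("2020-10-25", (6, "Sunday")), ("2018-01-13", (5, "Saturday"))]:
    assert day_of_week(iso) == expected, iso
```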

Length

[Figure: Histogram of lengths of the category]

[Figure: Pie chart]
Word frequencies
Value | Count | Frequency (%)
friday | 89013 | 16.0%
thursday | 84092 | 15.1%
tuesday | 82483 | 14.8%
wednesday | 81105 | 14.6%
monday | 79914 | 14.4%
saturday | 74467 | 13.4%
sunday | 65131 | 11.7%

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

hourofday
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.14288077
Minimum: 0
Maximum: 23
Zeros: 20154
Zeros (%): 3.6%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 9
median: 14
Q3: 17
95th percentile: 22
Maximum: 23
Range: 23
Interquartile range (IQR): 8
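The quantile statistics can be reproduced from the 24 hourly counts. The sketch assumes linear interpolation between order statistics at position (n - 1)·q, which is the default of pandas' `Series.quantile`. The count for hour 10 does not appear in the hourofday frequency tables, so 27086 is derived as 556205 minus the other 23 listed counts. With that value the counts also reproduce the reported Sum of 7310136. The helper names are hypothetical.

```python
import math

# Rows per hour of day (index = hour), copied from the hourofday frequency
# tables. Hour 10 is not listed there: 27086 = 556205 minus the other 23 counts.
HOUR_COUNTS = [20154, 9229, 6848, 6121, 6803, 7773, 13039, 18275, 30279, 28945,
               27086, 28795, 30785, 32317, 37817, 35932, 39694, 39205, 34207,
               27633, 22723, 19590, 17846, 15109]

def kth(counts, k):
    """k-th smallest observation (0-based) when value i occurs counts[i] times."""
    seen = 0
    for value, c in enumerate(counts):
        seen += c
        if k < seen:
            return value
    raise IndexError(k)

def quantile(counts, q):
    """Linear interpolation between order statistics at position (n - 1) * q."""
    pos = (sum(counts) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    a, b = kth(counts, lo), kth(counts, hi)
    return a + (pos - lo) * (b - a)

for label, q in [("5th percentile", 0.05), ("Q1", 0.25), ("median", 0.5),
                 ("Q3", 0.75), ("95th percentile", 0.95)]:
    print(label, quantile(HOUR_COUNTS, q))
```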

Descriptive statistics

Standard deviation: 5.760881449
Coefficient of variation (CV): 0.4383271483
Kurtosis: -0.4082702086
Mean: 13.14288077
Median Absolute Deviation (MAD): 4
Skewness: -0.4445238765
Sum: 7310136
Variance: 33.18775507
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=24)]
Common values
Value | Count | Frequency (%)
16 | 39694 | 7.1%
17 | 39205 | 7.0%
14 | 37817 | 6.8%
15 | 35932 | 6.5%
18 | 34207 | 6.2%
13 | 32317 | 5.8%
12 | 30785 | 5.5%
8 | 30279 | 5.4%
9 | 28945 | 5.2%
11 | 28795 | 5.2%
Other values (14) | 218229 | 39.2%
Minimum 10 values
Value | Count | Frequency (%)
0 | 20154 | 3.6%
1 | 9229 | 1.7%
2 | 6848 | 1.2%
3 | 6121 | 1.1%
4 | 6803 | 1.2%
5 | 7773 | 1.4%
6 | 13039 | 2.3%
7 | 18275 | 3.3%
8 | 30279 | 5.4%
9 | 28945 | 5.2%
Maximum 10 values
Value | Count | Frequency (%)
23 | 15109 | 2.7%
22 | 17846 | 3.2%
21 | 19590 | 3.5%
20 | 22723 | 4.1%
19 | 27633 | 5.0%
18 | 34207 | 6.2%
17 | 39205 | 7.0%
16 | 39694 | 7.1%
15 | 35932 | 6.5%
14 | 37817 | 6.8%

minute
Real number (ℝ≥0)

ZEROS

Distinct: 60
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.26036623
Minimum: 0
Maximum: 59
Zeros: 111720
Zeros (%): 20.1%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 7
median: 25
Q3: 40
95th percentile: 54
Maximum: 59
Range: 59
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 18.12438195
Coefficient of variation (CV): 0.7470778379
Kurtosis: -1.238250639
Mean: 24.26036623
Median Absolute Deviation (MAD): 15
Skewness: 0.09370039452
Sum: 13493737
Variance: 328.4932211
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0 | 111720 | 20.1%
30 | 74354 | 13.4%
45 | 32750 | 5.9%
15 | 31420 | 5.6%
50 | 30419 | 5.5%
20 | 29812 | 5.4%
40 | 28182 | 5.1%
10 | 22176 | 4.0%
25 | 14883 | 2.7%
35 | 14620 | 2.6%
Other values (50) | 165869 | 29.8%
Minimum 10 values
Value | Count | Frequency (%)
0 | 111720 | 20.1%
1 | 2483 | 0.4%
2 | 2703 | 0.5%
3 | 2777 | 0.5%
4 | 2817 | 0.5%
5 | 13674 | 2.5%
6 | 2873 | 0.5%
7 | 2839 | 0.5%
8 | 3422 | 0.6%
9 | 2805 | 0.5%
Maximum 10 values
Value | Count | Frequency (%)
59 | 2296 | 0.4%
58 | 3402 | 0.6%
57 | 2920 | 0.5%
56 | 2795 | 0.5%
55 | 13894 | 2.5%
54 | 2936 | 0.5%
53 | 3036 | 0.5%
52 | 2812 | 0.5%
51 | 2447 | 0.4%
50 | 30419 | 5.5%

timeofdaypercent
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 1440
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.54722021
Minimum: 0
Maximum: 23.98333333
Zeros: 8482
Zeros (%): 1.5%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 1.75
Q1: 9.583333333
median: 14.25
Q3: 17.9
95th percentile: 22.16666667
Maximum: 23.98333333
Range: 23.98333333
Interquartile range (IQR): 8.316666667

Descriptive statistics

Standard deviation: 5.774132212
Coefficient of variation (CV): 0.4262226584
Kurtosis: -0.3782311866
Mean: 13.54722021
Median Absolute Deviation (MAD): 4.083333333
Skewness: -0.4525256113
Sum: 7535031.617
Variance: 33.3406028
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0 | 8482 | 1.5%
16 | 7813 | 1.4%
17 | 7558 | 1.4%
15 | 7323 | 1.3%
14 | 7230 | 1.3%
18 | 6938 | 1.2%
13 | 6655 | 1.2%
12 | 6122 | 1.1%
9 | 6021 | 1.1%
8 | 5729 | 1.0%
Other values (1430) | 486334 | 87.4%
Minimum 10 values
Value | Count | Frequency (%)
0 | 8482 | 1.5%
0.01666666667 | 431 | 0.1%
0.03333333333 | 130 | < 0.1%
0.05 | 121 | < 0.1%
0.06666666667 | 99 | < 0.1%
0.08333333333 | 768 | 0.1%
0.1 | 98 | < 0.1%
0.1166666667 | 84 | < 0.1%
0.1333333333 | 97 | < 0.1%
0.15 | 99 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
23.98333333 | 76 | < 0.1%
23.96666667 | 91 | < 0.1%
23.95 | 75 | < 0.1%
23.93333333 | 84 | < 0.1%
23.91666667 | 450 | 0.1%
23.9 | 80 | < 0.1%
23.88333333 | 92 | < 0.1%
23.86666667 | 83 | < 0.1%
23.85 | 68 | < 0.1%
23.83333333 | 958 | 0.2%
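Despite its name, timeofdaypercent holds fractional hours since midnight rather than a percentage. Its maximum is 23.98333333 = 23 + 59/60, and in the sample rows 22:50 becomes 22.833333 and 0:49 becomes 0.816667. The reported means agree as well: 13.14288077 + 24.26036623 / 60 ≈ 13.54722021. The sketch below shows this inferred relation. It is not the author's code, and `time_of_day_hours` is a hypothetical name.

```python
def time_of_day_hours(hour, minute):
    """Fractional hours since midnight (inferred: hourofday + minute / 60)."""
    return hour + minute / 60

# (hourofday, minute, timeofdaypercent) triples copied from the sample rows.
SAMPLES = [(22, 50, 22.833333), (15, 49, 15.816667), (0, 49, 0.816667), (17, 29, 17.483333)]
for h, m, shown in SAMPLES:
    # The report rounds to six decimals, hence the tolerance.
    assert abs(time_of_day_hours(h, m) - shown) < 1e-6
```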

month
Categorical

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
January: 49674
October: 48141
March: 48110
June: 47953
July: 47404
Other values (7): 314923

Length

Max length: 9
Median length: 7
Mean length: 6.164412402
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: May
2nd row: January
3rd row: December
4th row: December
5th row: April

Common Values

Value | Count | Frequency (%)
January | 49674 | 8.9%
October | 48141 | 8.7%
March | 48110 | 8.6%
June | 47953 | 8.6%
July | 47404 | 8.5%
August | 46677 | 8.4%
May | 46594 | 8.4%
September | 46387 | 8.3%
February | 45759 | 8.2%
November | 45417 | 8.2%
Other values (2) | 84089 | 15.1%

Length

[Figure: Histogram of lengths of the category]
Word frequencies
Value | Count | Frequency (%)
january | 49674 | 8.9%
october | 48141 | 8.7%
march | 48110 | 8.6%
june | 47953 | 8.6%
july | 47404 | 8.5%
august | 46677 | 8.4%
may | 46594 | 8.4%
september | 46387 | 8.3%
february | 45759 | 8.2%
november | 45417 | 8.2%
Other values (2) | 84089 | 15.1%

Most occurring characters: No values found.
Most occurring categories: No values found.
Most occurring scripts: No values found.
Most occurring blocks: No values found.

hourofweek
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 168
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82.87869401
Minimum: 0
Maximum: 167
Zeros: 2693
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 11
Q1: 41
median: 84
Q3: 120
95th percentile: 159
Maximum: 167
Range: 167
Interquartile range (IQR): 79

Descriptive statistics

Standard deviation: 46.93774189
Coefficient of variation (CV): 0.5663426826
Kurtosis: -1.148587267
Mean: 82.87869401
Median Absolute Deviation (MAD): 41
Skewness: 0.02702035481
Sum: 46097544
Variance: 2203.151614
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
112 | 6490 | 1.2%
113 | 6397 | 1.2%
89 | 6259 | 1.1%
41 | 6207 | 1.1%
88 | 6206 | 1.1%
40 | 6081 | 1.1%
110 | 6067 | 1.1%
65 | 6034 | 1.1%
111 | 6021 | 1.1%
16 | 5961 | 1.1%
Other values (158) | 494482 | 88.9%
Minimum 10 values
Value | Count | Frequency (%)
0 | 2693 | 0.5%
1 | 1081 | 0.2%
2 | 746 | 0.1%
3 | 616 | 0.1%
4 | 713 | 0.1%
5 | 1054 | 0.2%
6 | 2200 | 0.4%
7 | 3140 | 0.6%
8 | 5086 | 0.9%
9 | 4806 | 0.9%
Maximum 10 values
Value | Count | Frequency (%)
167 | 1981 | 0.4%
166 | 2508 | 0.5%
165 | 2679 | 0.5%
164 | 3007 | 0.5%
163 | 3300 | 0.6%
162 | 3861 | 0.7%
161 | 4013 | 0.7%
160 | 4241 | 0.8%
159 | 4025 | 0.7%
158 | 4452 | 0.8%
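hourofweek appears to count hours since Monday 00:00, that is 24 × DayOfWeekNumber + hourofday. This gives the 168 = 7 × 24 distinct values from 0 to 167. The sample rows fit the relation: Tuesday at 22:50 gives 46, and Sunday at 2:00 gives 146. Because the relation is linear, the means agree too: 24 × 2.905658885 + 13.14288077 = 82.87869401. A sketch of the inferred relation follows; `hour_of_week` is a hypothetical helper.

```python
def hour_of_week(day_number, hour):
    """Hours since Monday 00:00, with day_number 0 = Monday (inferred relation)."""
    return 24 * day_number + hour

# (DayOfWeekNumber, hourofday, hourofweek) triples copied from the sample rows.
SAMPLES = [(1, 22, 46), (3, 16, 88), (6, 2, 146), (5, 17, 137), (6, 15, 159)]
for d, h, shown in SAMPLES:
    assert hour_of_week(d, h) == shown

# The linear relation carries over to the reported means (approximately 82.87869401).
print(24 * 2.905658885 + 13.14288077)
```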

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
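Following that definition, ρ is Pearson's r computed on ranks. The minimal stdlib sketch below assumes that tied values receive the average of their rank positions, which is the usual convention. The function names and example values are illustrative.

```python
import math

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of this tie group
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(spearman(x, y))    # essentially 1: the relationship is perfectly monotonic
print(pearson(x, y))     # below 1: the relationship is not linear
```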

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables, and a negative scale factor only flips its sign. So for points lying exactly on a straight line, |r| = 1 regardless of the line's angle to the x-axis, provided the line is neither horizontal nor vertical.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
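The short sketch below computes r as the covariance over the product of standard deviations. It then checks the invariance properties described above: shifting and positive rescaling leave r unchanged, a negative scale factor flips its sign, and collinear points give r = 1 whatever the slope. The data are arbitrary illustrative values, and `pearson_r` is a hypothetical name.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Population covariance divided by the product of population standard deviations."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]
r = pearson_r(x, y)
print(r)
# Shifting and positively rescaling either variable leaves r unchanged ...
assert abs(pearson_r([3 * a + 10 for a in x], [0.5 * b - 4 for b in y]) - r) < 1e-9
# ... while a negative scale factor only flips its sign.
assert abs(pearson_r(x, [-2 * b for b in y]) + r) < 1e-9
# Points exactly on a non-horizontal line give r = 1, even for a tiny slope.
assert abs(pearson_r(x, [0.01 * a + 5 for a in x]) - 1) < 1e-9
```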

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations. This is the form for data without ties.
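The sketch below is a direct O(n²) implementation of the pair-counting definition for data without ties, a form often called τ-a. Tie-adjusted variants are more common in practice; SciPy's `scipy.stats.kendalltau`, for instance, computes τ-b by default. The example values are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; assumes no ties (tau-a)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))  # 0.4: 7 concordant and 3 discordant out of 10 pairs
```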

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The plain empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
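The sketch below implements the bias-corrected estimator, written from the correction as it is commonly stated. It computes φ² = χ²/n from the contingency table, then φ̃² = max(0, φ² - (k - 1)(r - 1)/(n - 1)), k̃ = k - (k - 1)²/(n - 1) and r̃ = r - (r - 1)²/(n - 1). The result is Ṽ = sqrt(φ̃² / min(k̃ - 1, r̃ - 1)). It is not the profiler's own code, and it assumes every row and column of the table has a non-zero total.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts).

    Assumes no row or column total is zero.
    """
    r, k = len(table), len(table[0])
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson's chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Shrink phi^2 and the table dimensions by their small-sample bias.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
```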

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
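One common way to compute nullity correlation, used for example by the missingno library's heatmap, is Pearson's r between per-column missingness indicators (1 for a missing cell, 0 otherwise). The stdlib sketch below uses the BOROUGH and LATITUDE values from the first ten sample rows. `nullity_correlation` is a hypothetical helper, and `None` stands in for NaN.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    None marks a missing cell. If either column is fully present or fully
    missing, its indicator is constant and the result is NaN (undefined).
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)

# BOROUGH and LATITUDE from the first ten sample rows (None = NaN).
borough = ["BROOKLYN", None, None, None, None, None, None, None, "MANHATTAN", None]
latitude = [40.69754, None, None, None, 40.671585, None, None, 40.651974, 40.77161, None]
print(nullity_correlation(borough, latitude))
```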
The dendrogram groups variables by the similarity of their completion patterns, revealing structure beyond the pairwise relationships visible in the correlation heatmap.

Sample

First rows

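| | df_index | CRASH DATE | CRASH TIME | BOROUGH | LATITUDE | LONGITUDE | LOCATION | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | DateTime | Year | Time | DayOfWeekNumber | DayOfWeek | hourofday | minute | timeofdaypercent | month | hourofweek |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13 | 2019-05-21 | 22:50 | BROOKLYN | 40.697540 | -73.98312 | (40.69754, -73.98312) | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | 4136992 | �MBU | Taxi | 2019-05-21 | 2019 | 1900-01-01 22:50:00 | 1 | Tuesday | 22 | 50 | 22.833333 | May | 46 |
| 1 | 14 | 2020-01-21 | 15:49 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | 4277087 | Sedan | Station Wagon/Sport Utility Vehicle | 2020-01-21 | 2020 | 1900-01-01 15:49:00 | 1 | Tuesday | 15 | 49 | 15.816667 | January | 39 |
| 2 | 39 | 2020-12-31 | 16:30 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Reaction to Uninvolved Vehicle | 4380668 | Sedan | NaN | 2020-12-31 | 2020 | 1900-01-01 16:30:00 | 3 | Thursday | 16 | 30 | 16.500000 | December | 88 |
| 3 | 93 | 2020-12-25 | 20:19 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | 4380494 | Sedan | Sedan | 2020-12-25 | 2020 | 1900-01-01 20:19:00 | 4 | Friday | 20 | 19 | 20.316667 | December | 116 |
| 4 | 423 | 2020-04-15 | 15:20 | NaN | 40.671585 | -73.99843 | (40.671585, -73.99843) | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | 4407790 | Sedan | Tractor Truck Diesel | 2020-04-15 | 2020 | 1900-01-01 15:20:00 | 2 | Wednesday | 15 | 20 | 15.333333 | April | 63 |
| 5 | 662 | 2020-10-25 | 2:00 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | 4360880 | Sedan | NaN | 2020-10-25 | 2020 | 1900-01-01 02:00:00 | 6 | Sunday | 2 | 0 | 2.000000 | October | 146 |
| 6 | 690 | 2020-11-11 | 16:33 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | 4387870 | Station Wagon/Sport Utility Vehicle | NaN | 2020-11-11 | 2020 | 1900-01-01 16:33:00 | 2 | Wednesday | 16 | 33 | 16.550000 | November | 64 |
| 7 | 847 | 2019-04-17 | 0:49 | NaN | 40.651974 | -73.86542 | (40.651974, -73.86542) | 3.0 | 0.0 | 0 | 0 | 0 | 0 | 3 | 0 | Following Too Closely | 4408571 | Station Wagon/Sport Utility Vehicle | Sedan | 2019-04-17 | 2019 | 1900-01-01 00:49:00 | 2 | Wednesday | 0 | 49 | 0.816667 | April | 48 |
| 8 | 1044 | 2020-04-17 | 1:50 | MANHATTAN | 40.771610 | -73.99046 | (40.77161, -73.99046) | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | 4408441 | Sedan | NaN | 2020-04-17 | 2020 | 1900-01-01 01:50:00 | 4 | Friday | 1 | 50 | 1.833333 | April | 97 |
| 9 | 1145 | 2020-12-18 | 7:00 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Pavement Slippery | 4377115 | Station Wagon/Sport Utility Vehicle | NaN | 2020-12-18 | 2020 | 1900-01-01 07:00:00 | 4 | Friday | 7 | 0 | 7.000000 | December | 103 |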
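
The derived calendar columns in this preview appear to follow simple rules that can be checked against the rows above: DayOfWeekNumber counts from Monday = 0 (row 0, a Tuesday, has 1); hourofweek equals 24 × DayOfWeekNumber + hourofday (1 × 24 + 22 = 46 in row 0); and timeofdaypercent is the fractional hour, hourofday + minute / 60 (22 + 50/60 ≈ 22.833333), not a percentage. The 1900-01-01 date in Time is what parsing a bare H:MM string with a time-only format yields. The sketch below is a hypothetical pandas reconstruction, not the pipeline that produced this report; the function name add_time_features and the assumption that CRASH DATE and CRASH TIME start out as strings are illustrative.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reconstruction of the derived calendar columns.

    Assumes CRASH DATE holds "YYYY-MM-DD" strings and CRASH TIME holds
    "H:MM" or "HH:MM" strings.
    """
    out = df.copy()
    out["DateTime"] = pd.to_datetime(out["CRASH DATE"], format="%Y-%m-%d")
    out["Year"] = out["DateTime"].dt.year
    # Parsing a time-only string fills in the default date 1900-01-01,
    # which matches the Time column shown in the preview.
    out["Time"] = pd.to_datetime(out["CRASH TIME"], format="%H:%M")
    out["DayOfWeekNumber"] = out["DateTime"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["DayOfWeek"] = out["DateTime"].dt.day_name()
    out["hourofday"] = out["Time"].dt.hour
    out["minute"] = out["Time"].dt.minute
    # Fractional hour of the day in [0, 24), despite the "percent" name.
    out["timeofdaypercent"] = out["hourofday"] + out["minute"] / 60
    out["month"] = out["DateTime"].dt.month_name()
    # Hours elapsed since Monday 00:00, in [0, 167].
    out["hourofweek"] = out["DayOfWeekNumber"] * 24 + out["hourofday"]
    return out


# Dates and times copied from preview rows 0, 5 and 556203.
sample = pd.DataFrame(
    {
        "CRASH DATE": ["2019-05-21", "2020-10-25", "2018-01-23"],
        "CRASH TIME": ["22:50", "2:00", "6:30"],
    }
)
features = add_time_features(sample)
print(features[["DayOfWeek", "hourofday", "minute", "timeofdaypercent", "month", "hourofweek"]])
```

For these three rows the sketch reproduces the preview values for DayOfWeekNumber, hourofweek, timeofdaypercent, and month.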

Last rows

| | df_index | CRASH DATE | CRASH TIME | BOROUGH | LATITUDE | LONGITUDE | LOCATION | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | DateTime | Year | Time | DayOfWeekNumber | DayOfWeek | hourofday | minute | timeofdaypercent | month | hourofweek |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 556195 | 688748 | 2018-01-11 | 17:29 | BROOKLYN | 40.615944 | -73.975640 | (40.615944, -73.97564) | 1.0 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 | Driver Inattention/Distraction | 3828035 | Sedan | Bike | 2018-01-11 | 2018 | 1900-01-01 17:29:00 | 3 | Thursday | 17 | 29 | 17.483333 | January | 89 |
| 556196 | 688749 | 2018-01-21 | 18:27 | NaN | 40.741660 | -73.735886 | (40.74166, -73.735886) | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | 3832823 | Sedan | Station Wagon/Sport Utility Vehicle | 2018-01-21 | 2018 | 1900-01-01 18:27:00 | 6 | Sunday | 18 | 27 | 18.450000 | January | 162 |
| 556197 | 688750 | 2018-01-19 | 18:30 | NaN | 40.738487 | -73.989960 | (40.738487, -73.98996) | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | 3831311 | Pick-up Truck | Station Wagon/Sport Utility Vehicle | 2018-01-19 | 2018 | 1900-01-01 18:30:00 | 4 | Friday | 18 | 30 | 18.500000 | January | 114 |
| 556198 | 688751 | 2018-01-05 | 11:00 | NaN | 40.791435 | -73.850440 | (40.791435, -73.85044) | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing Too Closely | 3823209 | Sedan | NaN | 2018-01-05 | 2018 | 1900-01-01 11:00:00 | 4 | Friday | 11 | 0 | 11.000000 | January | 107 |
| 556199 | 688752 | 2018-01-13 | 17:30 | QUEENS | 40.713820 | -73.920770 | (40.71382, -73.92077) | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Pavement Slippery | 3828610 | Station Wagon/Sport Utility Vehicle | Sedan | 2018-01-13 | 2018 | 1900-01-01 17:30:00 | 5 | Saturday | 17 | 30 | 17.500000 | January | 137 |
| 556200 | 688753 | 2018-01-26 | 22:36 | BROOKLYN | 40.646640 | -73.924600 | (40.64664, -73.9246) | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | 3837602 | Sedan | Station Wagon/Sport Utility Vehicle | 2018-01-26 | 2018 | 1900-01-01 22:36:00 | 4 | Friday | 22 | 36 | 22.600000 | January | 118 |
| 556201 | 688754 | 2018-01-14 | 15:00 | NaN | 40.677483 | -73.930330 | (40.677483, -73.93033) | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | 3830100 | Station Wagon/Sport Utility Vehicle | Sedan | 2018-01-14 | 2018 | 1900-01-01 15:00:00 | 6 | Sunday | 15 | 0 | 15.000000 | January | 159 |
| 556202 | 688755 | 2018-01-14 | 12:20 | MANHATTAN | 40.867580 | -73.918420 | (40.86758, -73.91842) | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inexperience | 3830852 | Sedan | Station Wagon/Sport Utility Vehicle | 2018-01-14 | 2018 | 1900-01-01 12:20:00 | 6 | Sunday | 12 | 20 | 12.333333 | January | 156 |
| 556203 | 688756 | 2018-01-23 | 6:30 | NaN | 40.832764 | -73.945830 | (40.832764, -73.94583) | 1.0 | 0.0 | 1 | 0 | 0 | 0 | 0 | 0 | Failure to Yield Right-of-Way | 3834124 | Taxi | NaN | 2018-01-23 | 2018 | 1900-01-01 06:30:00 | 1 | Tuesday | 6 | 30 | 6.500000 | January | 30 |
| 556204 | 688757 | 2018-01-17 | 5:49 | BROOKLYN | 40.725880 | -73.941696 | (40.72588, -73.941696) | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Failure to Yield Right-of-Way | 3829741 | Sedan | Sedan | 2018-01-17 | 2018 | 1900-01-01 05:49:00 | 2 | Wednesday | 5 | 49 | 5.816667 | January | 53 |
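
In every row of both previews, NUMBER OF PERSONS INJURED equals the sum of the pedestrian, cyclist, and motorist injury counts (row 7: 0 + 0 + 3 = 3.0; row 556195: one injured cyclist), and the same holds for the KILLED columns. The report does not show whether this holds across all 556205 observations, so a consistency check along the lines of the sketch below could be run on the full frame. The function count_mismatches and the PARTS mapping are hypothetical helpers; rows whose total is missing are skipped rather than flagged, which is an assumption about how missing totals should be treated.

```python
import pandas as pd

# Hypothetical column groups: the per-road-user counts that should add up
# to the corresponding NUMBER OF PERSONS ... total.
PARTS = {
    "INJURED": [
        "NUMBER OF PEDESTRIANS INJURED",
        "NUMBER OF CYCLIST INJURED",
        "NUMBER OF MOTORIST INJURED",
    ],
    "KILLED": [
        "NUMBER OF PEDESTRIANS KILLED",
        "NUMBER OF CYCLIST KILLED",
        "NUMBER OF MOTORIST KILLED",
    ],
}


def count_mismatches(df: pd.DataFrame) -> dict:
    """Count rows whose NUMBER OF PERSONS total differs from the sum of its parts."""
    result = {}
    for outcome, parts in PARTS.items():
        total = df[f"NUMBER OF PERSONS {outcome}"]
        parts_sum = df[parts].sum(axis=1)
        # NaN compares unequal to everything, so rows with a missing total
        # are excluded explicitly instead of being counted as mismatches.
        mismatched = total.notna() & (total != parts_sum)
        result[outcome.lower()] = int(mismatched.sum())
    return result


# Three rows copied from the "Last rows" preview, keeping their index labels.
preview = pd.DataFrame(
    {
        "NUMBER OF PERSONS INJURED": [1.0, 1.0, 1.0],
        "NUMBER OF PERSONS KILLED": [0.0, 0.0, 0.0],
        "NUMBER OF PEDESTRIANS INJURED": [0, 0, 1],
        "NUMBER OF PEDESTRIANS KILLED": [0, 0, 0],
        "NUMBER OF CYCLIST INJURED": [1, 0, 0],
        "NUMBER OF CYCLIST KILLED": [0, 0, 0],
        "NUMBER OF MOTORIST INJURED": [0, 1, 0],
        "NUMBER OF MOTORIST KILLED": [0, 0, 0],
    },
    index=[556195, 556201, 556203],
)
print(count_mismatches(preview))  # {'injured': 0, 'killed': 0}
```

A non-zero count on the full frame would indicate rows where the person-level totals and the per-road-user breakdown disagree.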